
Data in Brief

Elsevier BV

All preprints, ranked by how well they match Data in Brief's content profile, based on 13 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
Quantifying the impact of sample, instrument, and data processing on biological signatures detected with Raman spectroscopy

Wiemann, J.; Heck, P. R.

2023-06-05 evolutionary biology 10.1101/2023.06.01.543279 medRxiv
Top 0.1%
8.5%

Raman spectroscopy is a popular tool for characterizing complex biological materials and their geological remains [1-10]. Ordination methods, such as Principal Component Analysis (PCA), rely on spectral variance to create a compositional space [1], the ChemoSpace, grouping samples based on spectroscopic manifestations that reflect different biological properties or geological processes [1-7]. PCA reduces the dimensionality of complex spectroscopic data and facilitates the extraction of relevant informative features into data formats suitable for downstream statistical analyses, thus representing an essential first step in the development of diagnostic biosignatures. However, there is presently no systematic survey of the impact of sample, instrument, and spectral processing on the occupation of the ChemoSpace. Here the influence of sample count, signal-to-noise ratios, spectrometer decalibration, baseline subtraction routines, and spectral normalization on ChemoSpace grouping is investigated using synthetic spectra. An increase in sample size improves the dissociation of sample groups in the ChemoSpace; however, a stable pattern in occupation can be achieved with fewer than 10 samples per group. Systemic noise of different amplitude and frequency, which can be introduced by instrument or sample [11,12], is eliminated by PCA even when spectra of differing signal-to-noise ratios are compared. Routine offsets (± 1 cm⁻¹) in spectrometer calibration contribute less than 0.1% of the total spectral variance captured in the ChemoSpace and do not obscure biological information. Standard adaptive baselining, together with normalization, increases spectral comparability and facilitates the extraction of informative features. The ChemoSpace approach to biosignatures represents a powerful tool for exploring, denoising, and integrating molecular biological information from modern and ancient organismal samples.
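
The ordination idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the peak positions, peak width, noise level, group size, and the helper names `gaussian_peak`, `make_group`, and `pca_scores` are all assumptions chosen for illustration. Two groups of synthetic spectra that differ in one band are normalized, mean-centered, and projected onto principal components via a singular value decomposition.

```python
import numpy as np


def gaussian_peak(x, center, width, height):
    """Return a Gaussian band evaluated on the wavenumber axis x."""
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)


def make_group(x, centers, n_samples, noise_sd, rng):
    """Build n_samples noisy copies of a spectrum with bands at `centers`.

    All values here are hypothetical; they only mimic the shape of a spectrum.
    """
    base = sum(gaussian_peak(x, c, 15.0, 1.0) for c in centers)
    return base + rng.normal(0.0, noise_sd, size=(n_samples, x.size))


def pca_scores(spectra, n_components=2):
    """Vector-normalize each spectrum, mean-center, and run PCA via SVD."""
    normalized = spectra / np.linalg.norm(spectra, axis=1, keepdims=True)
    centered = normalized - normalized.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
x = np.linspace(500.0, 1800.0, 651)  # assumed wavenumber axis in cm^-1

# Two hypothetical sample groups that share one band and differ in another.
group_a = make_group(x, [1000.0, 1450.0], 8, 0.05, rng)
group_b = make_group(x, [1000.0, 1650.0], 8, 0.05, rng)

scores, explained = pca_scores(np.vstack([group_a, group_b]))
print("PC1 scores, group A:", np.round(scores[:8, 0], 3))
print("PC1 scores, group B:", np.round(scores[8:, 0], 3))
print("Variance explained by PC1, PC2:", np.round(explained, 3))
```

In this toy setting the first component carries the between-group difference, so the two groups fall on opposite sides of PC1. Baseline subtraction and calibration offsets, which the abstract also examines, are left out of the sketch.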

2
Policies, practices, and experiences of European biobanks on sharing genomic biobank results with donors - a survey of BBMRI-ERIC biobanks

Brunfeldt, M.; Vrijenhoek, T.; Kaariainen, H.

2025-09-27 public and global health 10.1101/2025.09.25.25336629 medRxiv
Top 0.1%
6.4%

To study European biobanks' policies, practices, and experiences on communicating individual research results to participants, the EU Horizon 2020 Project Genetics Clinic of the Future performed two surveys, in 2016 and 2020. First, a questionnaire was sent in 2016 (Survey I) to 351 European biobanks in 13 countries that were members of Biobanking and Biomolecular Resources Research Infrastructure - European Research Infrastructure Consortium (BBMRI-ERIC). We received replies from 72 biobanks (response rate 21%), representing each of the 13 BBMRI Member States. Respondents were mainly directors or heads of biobanks. To evaluate how the policies and practices of biobanks evolved over time, we also conducted another survey in 2020 (Survey II). Survey I was implemented using the web-based Webropol tool, and Survey II was distributed by email. The biobanks had very different policies on sharing genomic data, and the policies had changed over time. The percentage of biobanks with a policy to share results with participants if they so wish had increased between 2016 and 2020 from 36% to 45%. In contrast, the percentage of biobanks with a policy to proactively re-contact the participants to share (some) results had decreased from 52% to 39%. Still in 2020, half of the biobanks had never shared results with participants.

3
Proteomic deconvolution reveals distinct immune cell fractions in different body sites in SARS-Cov-2 positive individuals

Okendo, J.; Okanda, D.

2022-01-23 health informatics 10.1101/2022.01.21.22269631 medRxiv
Top 0.1%
5.0%

Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to be a significant public health challenge globally. SARS-CoV-2 is a novel virus, and what constitutes immunological responses in different human body sites in infected individuals is yet to be presented. We set out to determine the various immune cell fractions in gargle solution, bronchoalveolar lavage fluid, nasopharyngeal, and urine samples post-SARS-CoV-2 infection in humans. Materials and methods: We downloaded proteomics data from PRIDE (https://www.ebi.ac.uk/pride/) with the following identifiers: PXD019423, n=3 (gargle solution); PXD018970, n=15 (urine); PXD022085, n=5 (bronchoalveolar lavage fluid); PXD022889, n=18 (nasopharyngeal). MaxQuant was used for peptide spectral matching against human and SARS-CoV-2 sequences downloaded from the UniProt database (access date 9th January 2022). The protein count matrix was extracted from the protein groups file and used as input to CIBERSORT to determine immune cell fractions. Results: The body of individuals infected with the SARS-CoV-2 virus is characterized by different fractions of immune cells in bronchoalveolar lavage fluid (BALF), nasopharyngeal, urine, and gargle solution samples. BALF has more abundant memory B cells, CD8, activated mast cells, and resting macrophages than urine, nasopharyngeal, and gargle solution samples. Our analysis also demonstrates that each body site comprises different immune cell fractions post-SARS-CoV-2 infection in humans. Conclusion: Different body sites are characterized by different immune cell fractions in SARS-CoV-2 infected individuals. The findings in this study can inform public health policies and health professionals on treatment strategies and drive SARS-CoV-2 diagnosis procedures.

4
The Somalia Mortality Estimation Database (S-MED): a Birds Eye View of Mortality and its Determinants

Ratnayake, R.; Ouchtar, Y.; Abukar Ahmed, Y.; Hassan Mohamoud, J.; Jelle, M.; Seal, A.; Isse Dirie, N.; Palmer, J.; Checchi, F.

2025-09-08 public and global health 10.1101/2025.09.07.25335291 medRxiv
Top 0.1%
4.9%

Globally, there is a lack of consolidation and thus sharing of critical mortality estimates which can serve as an early warning of the severity of a humanitarian crisis. This lack of a comprehensive view may mask critical situations, and impact on which crises are more measured and visible to the humanitarian community. This ultimately affects the allocation of scarce humanitarian resources. Somalia is a country marked by recurrent drought, armed conflict, food insecurity and malnutrition. To facilitate real-time investigation of mortality rates, we developed the open-source, publicly available Somalia Mortality Estimation Database (S-MED) to visualise mortality estimates from retrospective surveys and surveillance systems in Somalia in real-time. This enables improved awareness through visualization of mortality estimates as well as the capacity to facilitate multisectoral analysis of determinants of mortality (i.e., drought, displacement, disease outbreaks) and higher-level analysis (i.e., crisis-wide analysis of mortality estimates and mortality forecasting). In this paper, we describe the mortality, morbidity, food insecurity, and environmental data contained in S-MED. We show how its mortality data can be used for improved detection of early warning signals and a more comprehensive public health interpretation of drought and armed conflict-driven health crises in 2018 and 2022. Similar mortality surveillance initiatives could be adapted to crisis-affected settings globally.

5
Applied Ontologies for Global Health Surveillance and Pandemic Intelligence

Baker, C. J. O.; Al Manir, M. S.; Brenas, J. H.; Zinszer, K.; Shaban-Nejad, A.

2020-10-20 health informatics 10.1101/2020.10.17.20214460 medRxiv
Top 0.1%
4.8%

Global health surveillance and pandemic intelligence rely on the systematic collection and integration of data from diverse distributed and heterogeneous sources at various levels of granularity. These sources include data from multiple disciplines represented in different formats, languages, and structures, posing significant integration challenges. This article provides an overview of challenges in data-driven surveillance. Using malaria surveillance as a use case, we highlight the contribution made by emerging semantic data federation technologies that offer enhanced interoperability, interpretability and explainability through the adoption of ontologies. The paper concludes with a focus on the relevance of these technologies for ongoing pandemic preparedness initiatives.

6
Identification and quantification of α- and β-amanitin in wild mushrooms by HPLC-UV-EC and HPLC-DAD-MS detection

Barbosa, I.; Domingues, C.; Ramos, F.; Barbosa, R. M.

2022-03-11 pharmacology and toxicology 10.1101/2022.03.09.483521 medRxiv
Top 0.1%
4.4%

Amatoxins are a group of highly toxic peptides, which include α- and β-amanitin, found in several species of mushrooms (e.g. Amanita phalloides). Due to their high hepatotoxicity, they account for most deaths occurring after mushroom ingestion. The determination of α- and β-amanitin content in wild mushrooms is invaluable for treating cases involving poisoning. In the present study, we have developed and validated an analytical method based on high-performance liquid chromatography, with in-line ultraviolet and electrochemical detection (HPLC-UV-EC), for the rapid quantification of α- and β-amanitin in wild mushroom samples collected from the Inner Center of Portugal. A reproducible and simple solid-phase extraction (SPE) using OASIS® PRIME HLB cartridges was used for sample pre-treatment, followed by chromatographic separation on an RP-C18 column. The UV and EC chromatograms of α- and β-amanitin were recorded at 305 nm and +0.600 V vs. Ag/AgCl, respectively. The linear quantification for both amanitins was in the range of 0.5-20.0 µg·mL⁻¹ (R² > 0.999). The LOD, calculated based on the calibration curve, was similar for UV and EC detection (0.12-0.33 µg·mL⁻¹). Intra-day and inter-day precision were less than 13%, and the recovery ratios ranged from 89% to 117%. Nine Amanita species and five edible mushrooms were analysed by HPLC-UV-EC, and HPLC-DAD-MS confirmed the identification of amatoxins. We find high α- and β-amanitin content in A. phalloides and not in the other species analysed. In sum, the developed and validated method provides a simple and fast analysis of α- and β-amanitin contents in wild mushrooms and is suitable for screening and routine assessment of mushroom intoxication.
Highlights:
- New validated method using HPLC-UV-EC to determine α- and β-amanitin in wild mushrooms.
- Reproducible and fast SPE procedure for small samples.
- Effective sample pre-treatment with the OASIS® PRIME HLB SPE cartridge.
- Identification and quantification of α- and β-amanitin in wild mushroom samples from Portugal.
- HPLC-DAD-MS confirmation of amatoxins present in mushroom samples.

7
Food for the Empire: dietary pattern of Imperial Rome inhabitants.

De Angelis, F.; Varano, S.; Gazzaniga, V.; Santangeli Valenzani, R.; Brancazi, L.; Facchin, G.; Lubritto, C.; Ricci, P.; Martinez-Labarga, C.; Rickards, O.; Catalano, P.; Battistini, A.; Di Giannantonio, S.

2020-01-24 evolutionary biology 10.1101/2020.01.23.911370 medRxiv
Top 0.1%
4.0%

This paper aims to provide a broad diet reconstruction for people buried in archaeologically defined contexts in Rome (1st-3rd centuries CE), in order to combine archaeological and biological evidence focusing on dietary preferences in Imperial Rome. A sample of 214 human bones recovered from 6 funerary contexts was selected for carbon and nitrogen stable isotope analysis. The baseline for the terrestrial protein component of the diet was set using 17 coeval faunal remains recovered from excavations at Rome, supplemented by previously published data for the same geographic and chronological frames. δ13C ranges from -19.95‰ to -14.78‰, whereas δ15N values are between 7.17‰ and 10.00‰. The values are consistent with an overall diet mainly based on terrestrial resources. All the human samples rely on a higher trophic level than the primary consumer faunal samples. Certainly, C3 plants played a pivotal role in the dietary habits. However, C4 plants also seem to have been consumed, although they were not as widespread and were not always used for human consumption. The environment played a critical role also for Romans of lower social classes. The topographical location determined the preferential consumption of food that people could obtain from their neighborhood.

8
OpenChart-SE: A corpus of artificial Swedish electronic health records for imagined emergency care patients written by physicians in a crowd-sourcing project

Berg, J.; Aasa, C. O.; Appelgren Thorell, B.; Aits, S.

2023-01-05 health informatics 10.1101/2023.01.03.23284160 medRxiv
Top 0.1%
3.9%

Electronic health records (EHRs) are a rich source of information for medical research and public health monitoring. Information systems based on EHR data could also assist in patient care and hospital management. However, much of the data in EHRs is in the form of unstructured text, which is difficult to process for analysis. Natural language processing (NLP), a form of artificial intelligence, has the potential to enable automatic extraction of information from EHRs and several NLP tools adapted to the style of clinical writing have been developed for English and other major languages. In contrast, the development of NLP tools for less widely spoken languages such as Swedish has lagged behind. A major bottleneck in the development of NLP tools is the restricted access to EHRs due to legitimate patient privacy concerns. To overcome this issue we have generated a citizen science platform for collecting artificial Swedish EHRs with the help of Swedish physicians and medical students. These artificial EHRs describe imagined but plausible emergency care patients in a style that closely resembles EHRs used in emergency departments in Sweden. In the pilot phase, we collected a first batch of 50 artificial EHRs, which has passed review by an experienced Swedish emergency care physician. We make this dataset publicly available as OpenChart-SE corpus (version 1) under an open-source license for the NLP research community. The project is now open for general participation and Swedish physicians and medical students are invited to submit EHRs on the project website (https://github.com/Aitslab/openchart-se). Additional batches of quality-controlled EHRs will be released periodically.

9
A machine learning based approach to the segmentation of micro CT data in archaeological and evolutionary sciences

O'Mahoney, T.; McKnight, L.; Lowe, T.; Dunn, J.; Mednikova, M.

2019-11-30 evolutionary biology 10.1101/859983 medRxiv
Top 0.1%
3.8%

Segmentation of high-resolution tomographic data is often an extremely time-consuming task and until recently, has usually relied upon researchers manually selecting materials of interest slice by slice. With the exponential rise in datasets being acquired, this is clearly not a sustainable workflow. In this paper, we apply the Trainable Weka Segmentation (a freely available plugin for the multiplatform program ImageJ) to typical datasets found in archaeological and evolutionary sciences. We demonstrate that Trainable Weka Segmentation can provide a fast and robust method for segmentation and is as effective as other leading-edge machine learning segmentation techniques.

10
Proteomic cellular signatures of kinase inhibitor-induced cardiotoxicity: Mount Sinai DToxS LINCS Center Dataset

Xiong, Y.; Liu, T.; Chen, T.; Hansen, J.; Hu, B.; Chen, Y.; Jayaraman, G.; Schürer, S.; Vidovic, D.; Goldfarb, J.; Sobie, E. A.; Birtwistle, M. R.; Iyengar, R.; Li, H.; Azeloglu, E. U.

2020-02-26 pharmacology and toxicology 10.1101/2020.02.26.966606 medRxiv
Top 0.1%
3.8%

The Drug Toxicity Signature Generation Center (DToxS) at the Icahn School of Medicine at Mount Sinai is one of the centers of the NIH Library of Integrated Network-Based Cellular Signatures (LINCS) program. A key aim of DToxS is to generate both proteomic and transcriptomic signatures that can predict adverse effects, especially cardiotoxicity, of kinase inhibitors approved by the Food and Drug Administration. Towards this goal, high-throughput shotgun proteomics experiments (317 cell line/drug combinations + 64 control lysates) have been conducted at the Center for Advanced Proteomics Research at Rutgers University - New Jersey Medical School. Using computational network analyses, these proteomic data can be integrated with transcriptomic signatures generated in tandem to identify cellular signatures of cardiotoxicity that may predict kinase inhibitor-induced toxicity and possible mitigation. Both raw and processed proteomics data have been carefully screened for quality and made publicly available via the PRIDE database. As such, this broad protein kinase inhibitor-stimulated cardiomyocyte proteomic data and signature set is valuable for the prediction of drug toxicities.

11
Statistical and Evolutionary Analysis of Sequenced DNA from Breast Cancer FFPE Specimens

Kurpas, M. K.; Kus, P.; Jaksik, R.; Dinh, K. N.; Adamczyk, A.; Majchrzyk, K.; Kimmel, M.

2025-10-05 evolutionary biology 10.1101/2025.10.04.680485 medRxiv
Top 0.1%
3.7%

Background: Despite the introduction of instant freezing of tumor specimens, formalin-fixed paraffin-embedded (FFPE) blocks of tissue are still commonplace in clinical practice and constitute an important reference for genetic epidemiology of cancer. We carried out a study of a collection of breast tumors paired with lymph-node metastases and analyzed using advanced computational methods, to determine how much information can be obtained from mid-depth whole-exome bulk DNA sequencing. Methods: We gathered 15 paired (primary and an involved lymph node) excised breast tumors of different molecular subtypes (HER2+, triple negative, luminal A and luminal B HER2-), from the National Research Institute of Oncology, Krakow (Poland) Branch. FFPE specimens contained typical artifacts, manifesting themselves in spurious DNA variant calls. We used several bioinformatics tools to remove the artifacts and analyzed the exomic data, using both commercial and original in-house computational techniques. Results: We used several recent bioinformatics tools to remove the FFPE artifacts and found a serious dispersal of outcomes. After calibration, a series of analyses was performed, including a copy number study, resulting in ploidy levels ranging from 1 to 5 (average of 2.5). A positive association was found between the frequency of oncogenes relative to tumor suppressor genes and DNA copy number. In addition, we carried out analyses of the clonal structure of the data using original computational methods based on evolutionary modeling. Interesting results concerning clonal structure, early tumor expansion, and interdependence of the primary tumor and lymph node metastases have been obtained. Conclusions: Despite the imperfections of the FFPE data, many important features of molecular evolution of tumor DNA can be recovered from routine clinical samples.

12
AI-Driven Science Communication: Leveraging LLMs and Knowledge Graphs for Seamless Knowledge Exchange

Schor, J.; Scheibe, P.

2025-07-07 pharmacology and toxicology 10.1101/2025.07.04.663152 medRxiv
Top 0.1%
3.7%

Purpose: Scientific knowledge is increasingly captured in structured formats, such as knowledge graphs, yet it remains largely inaccessible to non-technical users. We present EcoToxFred, a prototype conversational AI agent that enables intuitive, natural language access to curated environmental toxicology data. Designed to support users without programming expertise, EcoToxFred facilitates the exploration of complex datasets, such as chemical exposures and species-specific hazard information in European surface waters. Methods: EcoToxFred integrates a large language model (LLM) with a Neo4j graph database via a retrieval-augmented generation (RAG) architecture. The system employs a decision-making agent to interpret user queries, invoke appropriate tools, and translate natural language input into formal graph queries. Outputs are validated and returned in multiple formats, such as text, tables, and interactive maps, and are grounded in structured, curated monitoring and hazard data. Results: The agent bridges the gap between human intent and formal data retrieval, enabling researchers, policy advisors, and stakeholders to pose complex, multi-step queries without prior training in query languages. By grounding LLM outputs in structured data, we demonstrate the system's ability to respond to diverse question types and deliver transparent, accurate, and context-aware results. EcoToxFred successfully answers broad and highly specific queries, bridging natural language input with formal data retrieval. Conclusion: EcoToxFred represents a scalable and transferable framework for human-AI interaction in domain-specific contexts, combining natural language interfaces with structured data. By lowering access barriers to scientific knowledge, the system supports evidence-based decision-making and fosters responsible, human-centered AI use in environmental science and beyond.

13
MALDI mass spectrometry imaging of fresh and processed food: constituents, ingredients, contaminants and additives

Kokesch-Himmelreich, J.; Wittek, O.; Race, A. M.; Rakete, S.; Schlicht, C.; Busch, U.; Roempp, A.

2021-12-23 pharmacology and toxicology 10.1101/2021.12.23.473956 medRxiv
Top 0.1%
3.7%

Mass spectrometry imaging (MS imaging) provides spatial information for a wide range of compound classes in different sample matrices. We used MS imaging to investigate the distribution of components in fresh and processed food, including meat, dairy and bakery products. The MS imaging workflow was optimized to cater to the specific properties and challenges of the individual samples. We successfully detected highly nonpolar and polar constituents such as beta-carotene and anthocyanins, respectively. For the first time, the distribution of a contaminant and a food additive was visualized in processed food. We detected acrylamide in German gingerbread and investigated the penetration of the preservative natamycin into cheese. For this purpose, a new data analysis tool was developed to study the penetration of analytes from uneven surfaces. Our results show that MS imaging has great potential in food analysis to provide relevant information about component distributions, particularly those underlying official regulations.
Highlights:
- Investigation of fresh and processed food by MALDI mass spectrometry imaging
- Visualization of different compound classes in plant and meat-based food
- Development of data processing tool for penetration/diffusion analysis (in food)
- Natamycin penetration in cheese, first visualization of food additive by MS imaging
- Acrylamide in gingerbread, first visualization of contaminant by MS imaging

14
Text Mining Approach to Analyze Coronavirus Impact: Mexico City as Case of Study

Chire Saire, J. E.; Pineda-Briseno, A.

2020-05-12 public and global health 10.1101/2020.05.07.20094466 medRxiv
Top 0.1%
3.6%

The epidemiological outbreak of a novel coronavirus (2019-nCoV or Covid-19) in China, and its rapid spread, gave rise to the first pandemic in the digital age. In response to this event, which took humanity by surprise, many countries adopted different strategies to stop the infection. In this context, one of the greatest challenges for the scientific community is monitoring the global population in real time to get immediate feedback on what is happening to people during this public health contingency. Social networks are an interesting and affordable way to achieve this. In a social network, people can act as sensors, providing not only personal data but also data derived from their behavior. This paper aims to analyze the publications of people in Mexico using a text mining approach. Specifically, Mexico City is presented as a case study to help understand the impact on society of the spread of Covid-19.

15
GenTIGS: A database empowering research and clinical insights on rare genetic disorders with an Indian perspective

Rashid, I.; S, P.; Moharir, S.; Mishra, R.

2025-04-01 genetic and genomic medicine 10.1101/2025.04.01.25325014 medRxiv
Top 0.1%
3.5%

Rare Genetic Diseases (RGDs) are conditions caused by gene mutations affecting fewer than 1 in 2,000 individuals, according to the World Health Organization (WHO). India, being the most populous country in the world, has a high prevalence of these diseases. The situation is worsened further by the practice of consanguinity in several communities and by limited genetic testing. Multiple recent developments, including advancements in genetic sequencing, precision medicine and patient advocacy supported by collaborative networks and government initiatives, offer hope for improved diagnosis and treatment. With the objective of bringing what is known about these rare disorders onto a single platform for researchers, clinicians and various stakeholders, we have developed a database, GenTIGS, that is a go-to platform for information about these diseases. GenTIGS is a comprehensive database that includes information about the genes, pathogenic variants, and clinical symptoms for in-depth exploration of RGDs, with a focus on globally reported disorders, especially those prevalent in India. For ease of use, GenTIGS encompasses an array of features and data points crucial for researchers and clinicians delving into RGD domains, ensuring efficient information retrieval. This data delivery system provides information on 2315 RGDs and 2779 associated genes, including 707 globally reported disorders prevalent in India. It also includes details on 3525 clinical symptoms and 307340 pathogenic variants for these disorders. GenTIGS provides extensive data and a comprehensive range of analytical tools and resources for researchers, clinicians, and academicians, facilitating in-depth exploration of genes and variants associated with rare genetic disorders and supporting advancements in genetic medicine by enhancing understanding and analysis within the scientific and clinical community. Available at: https://db.tigs.res.in/gentigs/

16
Visual Display Interface (VDI): A MATLAB Software Library For Simulating and Processing In-Vivo Magnetic Resonance Spectroscopy and Spectroscopic Imaging Data

Liu, Y.; Oeltzschner, G.; Ronen, I.; Schmidt, R.; Seginer, A.; Kirov, I. I.; Wu, D.; Zhang, Z.; Tal, A.

2023-08-31 radiology and imaging 10.1101/2023.08.31.23294888 medRxiv
Top 0.1%
3.5%

We describe a new comprehensive and open-source MATLAB library, called the Visual Display Interface (VDI), designed to process, model, quantify and visualize magnetic resonance spectroscopy (MRS) and spectroscopic imaging (MRSI) data. The library focuses on three major strengths: promoting reproducible research by creating comprehensive data logging and reporting and identifying outlier datasets; seamlessly combining spectral and spatial processing of both single and multivoxel data; and offering a modern, object-oriented design. VDI handles a wide range of common tasks, including spectral and spatial transforms, built-in spectral fitting, absolute quantification, and running density-matrix simulations for coupled and uncoupled spin systems to generate appropriate basis sets and aid in sequence design. VDI interfaces with the Statistical Parametric Mapping (SPM) toolbox to carry out tissue segmentation, calculate tissue fractions within voxels, derive mean metabolite values from regions of interest defined by anatomical or functional atlases, and perform linear regression for global white matter and gray matter metabolite concentrations. The library's workings are demonstrated for two tasks: (1) pre-processing, fitting and analysis of single-voxel proton MRS data from healthy volunteers; and (2) extracting region-specific metabolite concentrations from spectroscopic imaging data based on an existing cortical atlas in MNI space, and calculating average gray and white matter global concentrations using linear regression.
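
The linear-regression step mentioned in this abstract can be sketched as follows. This is an illustration only, written in Python rather than MATLAB, and it does not use or reproduce the VDI API. It assumes a simple two-compartment model in which each voxel's metabolite value is the tissue-fraction-weighted sum of one global gray matter and one global white matter concentration; the function name `fit_tissue_concentrations`, the simulated fractions, and the noise level are hypothetical, and cerebrospinal fluid is ignored.

```python
import numpy as np


def fit_tissue_concentrations(gm_fraction, wm_fraction, voxel_values):
    """Least-squares fit of value = c_gm * gm_fraction + c_wm * wm_fraction.

    Returns the estimated global gray and white matter concentrations.
    The two-compartment model is an assumption made for this sketch.
    """
    design = np.column_stack([gm_fraction, wm_fraction])
    coeffs, *_ = np.linalg.lstsq(design, voxel_values, rcond=None)
    return coeffs[0], coeffs[1]


# Simulated voxels: hypothetical tissue fractions and concentrations.
rng = np.random.default_rng(1)
gm = rng.uniform(0.1, 0.9, size=200)
wm = 1.0 - gm
true_gm, true_wm = 10.0, 7.0
values = true_gm * gm + true_wm * wm + rng.normal(0.0, 0.2, size=200)

est_gm, est_wm = fit_tissue_concentrations(gm, wm, values)
print(f"Estimated GM concentration: {est_gm:.2f}")
print(f"Estimated WM concentration: {est_wm:.2f}")
```

Fitting both coefficients without an intercept is equivalent to regressing the voxel values on the gray matter fraction and reading off the fitted line at fractions of 1 and 0, provided the two fractions sum to one in every voxel.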

17
The effect of conflict on medical facilities in Mariupol, Ukraine: a quasi-experimental study

Poole, D. N.; Andersen, D.; Raymond, N. A.; Parham, J.; Howarth, C.; Hathaway, O. A.; Khoshnood, K.; Yale Humanitarian Research Lab

2023-08-06 public and global health 10.1101/2023.08.01.23293508 medRxiv
Top 0.1%
3.4%

Medical facilities are civilian objects specially protected by international humanitarian law. Despite the need for systematic documentation of the effects of war on medical facilities for judiciary accountability, current methods for surveilling damage to protected civilian objects during ongoing armed conflict are insufficient. Satellite imagery damage assessment confers significant possibilities for investigating patterns of war. We leveraged commercially and publicly available satellite imagery and geolocated facility data to conduct a pre-post quasi-experimental study of damage to medical infrastructure in Mariupol, Ukraine as a result of Russia's invasion. We found that 77% of medical facilities in Mariupol sustained damage during Russia's siege, which lasted from February 24 to May 20, 2022. Facility size was not associated with damage, suggesting that attacks on medical facilities are not a residual of physical infrastructure characteristics. This is the first geographically comprehensive pre-post study of the effects of an ongoing conflict on specially protected medical infrastructure.

18
High-quality proteins and RNAs extracted from exact same samples for proteomics and RNA-Seq analyses

Fatou, M.; Kornobis, E.; Douche, T.; Druart, K.; Puchot, N.; Matondo, M.; Monot, M.; Bourgouin, C.

2026-01-19 molecular biology 10.64898/2026.01.16.699903 medRxiv
Top 0.1%
3.3%

Back in the 1990s, the single-step method developed by Chomczynski and Sacchi for RNA isolation was extended to sequential isolation of RNA, DNA and proteins from the same sample. Although the quality of the extracted RNA proved compatible with RNA-Seq analyses, the extraction of the protein fraction from the same sample was time-consuming and resulted in a low yield and quality of proteins not compatible with LC-MS proteomic analyses. Here we report a novel procedure for isolating in parallel the protein fraction and the RNA fraction from the exact same minute mosquito samples. We provide evidence that the cognate fractions are compatible with LC-MS proteomic analysis on the one hand and RNA-Seq analysis on the other hand. This protocol is simple, time-efficient and adequate for studies involving limited sample size and could be applied easily to a broad range of animal and human samples.

19
Evaluating Regional Diversity in Scientific Communication: A Comparative Analysis of COVID-19 Preprints and Peer-Reviewed Publications

Kim, D. H.; Jeon, K. L.; You, S. C.

2025-01-10 public and global health 10.1101/2025.01.04.25319994 medRxiv
Top 0.1%
3.2%

Background: The unprecedented COVID-19 pandemic has triggered extensive global research, leading to an overwhelming surge in publications, including a surge of preprints. Despite the proliferation of preprints during the pandemic, their implications for global diversity, along with their utility, remain underexplored. In this study, we assess the contribution of COVID-19 preprints in diverse aspects. Methods: We collected COVID-19-related peer-reviewed papers and preprints from SCOPUS and medRxiv, respectively, published between December 2019 and November 2022. We analyzed four key aspects of scientific communication: 1) international co-authorship patterns using network analysis and eigenvector centrality, 2) publication patterns through relative ratio analysis comparing preprint to peer-reviewed paper counts, 3) social media dissemination through analysis of X (formerly Twitter) post quotations, and 4) citation impact by comparing citation counts between peer-reviewed papers with and without preprint history. All analyses were stratified by country income levels and geographical regions. Results: Network analysis revealed higher co-authorship diversity in preprints, with Sub-Saharan Africa, Latin America, and the Caribbean showing 3.9 to 4.5 times higher eigenvector centrality compared to peer-reviewed papers. Countries with lower GDP showed significantly higher preprint publication ratios (correlation coefficient: -0.38, p-value < 0.001). Social media analysis demonstrated higher engagement with preprints, as evidenced by higher median numbers of social media quotations for preprints across all income groups. Peer-reviewed papers with preprint history received significantly more citations (median: 10, IQR: 3-30) than those without (median: 5, IQR: 1-15, p-value < 0.001), a difference particularly pronounced in low- and middle-income countries.
Conclusion: This study demonstrates the significant role of preprints in advancing regional diversity in scientific communication during the COVID-19 pandemic. Our findings show enhanced international collaboration through preprints, particularly benefiting researchers from lower-income regions, higher social media engagement across income groups, and increased citation impact for papers with preprint history. These results highlight preprints as an important tool for promoting more equitable global scientific discourse.

20
COVID-19: a crash test for biomedical publishing?

Iourov, I. Y.; Zelenova, M. A.; Vorsanova, S. G.

2020-08-17 health informatics 10.1101/2020.06.13.20130310 medRxiv
Top 0.1%
3.1%

The effect of COVID-19 on biomedical publishing (BP) (i.e. scientific biomedical periodicals continuously published by research communities or commercial publishers) has not been deeply explored. To estimate the immediate COVID-19 impact on BP, we have assessed PubMed-indexed articles about COVID-19 (PMIAC) from December 2019 to April 2020. PMIAC have been classified according to publication date, country, and journal to evaluate the time-, region- and scientometric-dependent impact of COVID-19 on BP, and have been curated manually (i.e. each entry has been individually analyzed). PMIAC analysis reflects geographic and temporal parameters of outbreak spread. A major BP problem is that only 40% of articles report/review/analyze data. Another BP weakness is the clusterization of "highly-trusted" publications according to countries of origin and "highly impacting" journals. Finally, a problem highlighted by the COVID-19 crisis is the increased specification of biomedical research. To solve this problem, analytical reviews integrating data from different areas of biology and medicine are required. The data on PMIAC suggest the priority of "what is published" over "where it is published" and "who are the authors". We believe that our brief analysis may help to shape forthcoming BP to become more effective in solving immediate problems resulting from global threats.